- Article
Machine Learning Calibration of Smartphone-Based Infrared Thermal Cameras: Improved Bias and Persistent Random Error
- Jayroop Ramesh,
- Tom Loney and
- Thomas Boillat
- + 4 authors
Low-cost, smartphone-based thermal cameras offer unprecedented accessibility for physiological monitoring, yet their validity and reliability for absolute skin temperature measurement in clinical settings remain contentious. This study aims to quantify the agreement and repeatability of a widely used smartphone thermal camera, the FLIR One Pro, against a consumer-grade, non-contact infrared thermometer, the iHealth PT3. A method comparison study was conducted with 40 healthy adult participants, yielding a total of 2400 temperature measurements. Skin temperature of the hand dorsum was measured concurrently with the FLIR One Pro and the iHealth PT3. The protocol involved two rounds: Round 1 (R1) in a stable, static environment to assess baseline repeatability, and Round 2 (R2) in a dynamic environment mimicking clinical repositioning. The performance of the instruments was compared using paired t-tests for mean differences and Bland–Altman analysis for assessing agreement. The iHealth PT3 demonstrated superior precision, with an average intra-participant standard deviation (SD) of 0.030 °C in R1 and 0.092 °C in R2. In stark contrast, the FLIR One Pro exhibited significantly higher variability, with an average SD of 0.34 °C in R1 and 0.30 °C in R2. Bland–Altman analysis revealed a substantial mean bias of −1.42 °C in R1 and −1.15 °C in R2, with critically wide 95% limits of agreement spanning approximately 6 °C. The substantial systematic bias and poor agreement of the FLIR One Pro far exceed both its manufacturer-stated accuracy and clinically acceptable error margins for absolute temperature measurement. To further examine whether calibration could mitigate these deficiencies, we applied a suite of ten machine learning regressors to map FLIR readings onto iHealth PT3 values. Calibration reduced systematic bias across all models, with Quantile Gradient-Boosted Regression Trees achieving the lowest MAE (1.162 °C).
The Extra Trees model yielded the lowest RMSE (1.792 °C) and the highest explained variance (R² = 0.152), yet this relatively low value confirms that the device’s high intrinsic variability limits the effectiveness of algorithmic correction. As such, the device has limited utility for longitudinal patient monitoring or for diagnostic decisions that rely on precise, absolute temperature thresholds. These findings inform medical practitioners in low-resource settings of the profound limitations of using this device as a standalone clinical thermometer and emphasize that algorithmic correction cannot compensate for fundamental hardware and measurement noise constraints.
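As a rough illustration of the Bland–Altman quantities reported in the abstract, the sketch below computes a mean bias and 95% limits of agreement from paired readings. It is not the authors' analysis code: the function name `bland_altman`, the sign convention (test device minus reference) and the sample readings are all assumptions made for illustration, and the limits use the conventional bias ± 1.96 × SD of the differences.

```python
import statistics


def bland_altman(test_readings, reference_readings):
    """Return (bias, lower_loa, upper_loa) for paired readings.

    Assumption: differences are taken as test minus reference, so a
    negative bias means the test device reads lower than the reference.
    """
    if len(test_readings) != len(reference_readings):
        raise ValueError("paired readings must have equal length")
    if len(test_readings) < 2:
        raise ValueError("at least two pairs are needed to estimate the SD")

    # Pairwise differences between the two instruments.
    diffs = [t - r for t, r in zip(test_readings, reference_readings)]

    bias = statistics.mean(diffs)   # systematic offset
    sd = statistics.stdev(diffs)    # sample SD of the differences
    half_width = 1.96 * sd          # conventional 95% multiplier

    return bias, bias - half_width, bias + half_width


# Made-up readings in degrees Celsius, for illustration only.
camera = [32.1, 33.4, 31.0, 34.2, 30.5]
thermometer = [33.8, 34.1, 33.5, 34.0, 33.2]

bias, lower, upper = bland_altman(camera, thermometer)
print(f"bias = {bias:.2f} °C, 95% LoA = [{lower:.2f}, {upper:.2f}] °C")
```

The width of the interval (`upper - lower`) is driven by the spread of the differences rather than by their mean, which is why a calibration that only shifts or rescales readings can reduce the bias while leaving wide limits of agreement largely intact.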
17 February 2026









